Apache Iceberg | ifkarsyah

Projects

Kadita — Config-Driven Data Ingestion Platform

A Kubernetes-inspired, YAML-configured data platform that ingests from Postgres, MySQL, MongoDB, Jira, Zendesk, and S3 into an Apache Iceberg data lake.

Data Lake Apache Iceberg S3 Python

Blog Posts

Nov 17, 2024

Iceberg Series, Part 6: Multi-Engine & Maintenance

Querying Iceberg from Trino, Flink, and DuckDB; expiring snapshots; rewriting data files; and keeping Iceberg tables healthy in production.

Data Lake Apache Iceberg

Nov 10, 2024

Iceberg Series, Part 5: Row-Level Operations

How MERGE, UPDATE, and DELETE work in Iceberg — copy-on-write vs merge-on-read, when to use each, and the performance trade-offs.

Data Lake Apache Iceberg

Nov 3, 2024

Iceberg Series, Part 4: Hidden Partitioning & Evolution

Partition transforms that derive partition values automatically, partition evolution that changes strategy without rewriting data, and why these are Iceberg's biggest ergonomic wins.

Data Lake Apache Iceberg

Oct 27, 2024

Iceberg Series, Part 3: Catalogs

How Hive, Glue, REST, and Nessie catalogs coordinate multi-engine access to Iceberg tables — and why the catalog abstraction is Iceberg's biggest differentiator.

Data Lake Apache Iceberg

Oct 20, 2024

Iceberg Series, Part 2: Table Format Internals

The four-layer metadata hierarchy — table metadata, manifest lists, manifest files, and data files — and how it enables efficient scans and snapshot isolation.

Data Lake Apache Iceberg

Oct 13, 2024

Iceberg Series, Part 1: Getting Started

Creating Iceberg tables with Spark, reading and writing data, running MERGE, using time travel, and inspecting table history.

Data Lake Apache Iceberg

Oct 6, 2024

Iceberg Series, Part 0: Overview

What Apache Iceberg is, how it differs from Delta Lake and Hudi, and why multi-engine interoperability is its defining advantage.

Data Lake Apache Iceberg
